Overview of the International Authorship Identification Competition at PAN-2011
Abstract
This paper gives an overview of the evaluation methodology applied to authorship identification solutions as part of PAN 2011. Two variations of authorship identification were explored: authorship attribution, which determines which of a known set of authors wrote a text, and authorship verification, which determines whether a specific author did or did not write a text. We summarize the participants' methods, which varied widely, and present the overall results of the evaluation.
Similar resources
Authorship Identification with Modality Specific Meta Features - Notebook for PAN at CLEF 2011
This paper presents the approach used in the PAN ’11 authorship identification competition. Our method extracts meta features from several independently generated clustering solutions from the training set. Each clustering solution uses a disjoint set of features that represent a specific linguistic modality. The different clustering solutions encode similarities in writing styles of authors ac...
Authorship Identification in Large Email Collections: Experiments Using Features that Belong to Different Linguistic Levels - Notebook for PAN at CLEF 2011
The aim of this paper is to explore the usefulness of features from different linguistic levels for email authorship identification. Using various email datasets provided by the PAN’11 lab, we tested several feature groups in both the authorship attribution and authorship verification subtasks. The selected feature groups combined with Regularized Logistic Regression and One-Class SVM machine learni...
Authorship Verification Using the Impostors Method - Notebook for PAN at CLEF 2013
This paper describes the evaluation of the GenIM method, which participated in the PAN’13 authorship identification competition. The approach is based on comparing the similarity between the given documents and a number of external (impostor) documents, so that documents can be classified as having been written by the same author if they are shown to be more similar to each other than to the ...
Authorship Identification of E-mail as a Multi-Class Task - Notebook for PAN at CLEF 2011
In this paper, we describe a multi-class text categorization approach to authorship attribution and test it on sets of e-mail collections. The PAN 2011 competition data consists of e-mails of variable length, written by various candidate authors, with some represented by significantly longer or more e-mails than others. Rather than construct a classifier for each separate author to discriminate...
EPSMS and the Document Occurrence Representation for Authorship Identification - Notebook for PAN at CLEF 2011
This paper describes the participation of the PISIS team in the authorship identification track of PAN’11. We adopted two different strategies for the tasks of authorship attribution and authorship verification. For authorship attribution we performed experiments with a document occurrence representation using a standard classification-based approach. Results obtained with this approach were mi...